What are we analyzing?
We aim to determine which correlation visualization methods are most effective for small and large datasets. The small dataset is mtcars (32 rows, 11 numeric variables); the large one is a 1,000-row sample of diamonds.
We will use Heatmaps for a quick overview of correlations and Scatter Matrix plots for a detailed examination of relationships between variables.
What the code does:
• Imports the necessary libraries for working with data and visualizations.
library(plotly)
library(ggplot2) # For diamonds
library(dplyr)
What the code does:
• Loads the mtcars dataset.
• Calculates the Pearson correlation matrix for all variables (every mtcars column is numeric) and rounds it to two decimals.
data("mtcars")
small_data <- mtcars
small_corr <- round(cor(small_data), 2)
What the code does:
• Creates a Heatmap to visualize the correlations, with the color range fixed to [-1, 1] and each cell labeled with its correlation value.
fig1 <- plot_ly(
  x = colnames(small_corr),
  y = rownames(small_corr),
  z = small_corr,
  type = "heatmap",
  colorscale = "Viridis",
  # Fix the color range so the same color means the same correlation in every heatmap
  zmin = -1,
  zmax = 1,
  text = small_corr,
  hoverinfo = "x+y+text"
) %>%
  layout(
    title = "Heatmap of Correlation (Small Dataset: mtcars)",
    # One label per cell: x follows the columns, y follows the rows
    annotations = list(
      x = rep(colnames(small_corr), each = nrow(small_corr)),
      y = rep(rownames(small_corr), ncol(small_corr)),
      text = as.character(small_corr),
      showarrow = FALSE,
      font = list(size = 12, color = "white")
    )
  )
fig1
About the plot:
The Heatmap displays the correlations between numeric variables in the mtcars dataset.
• Yellow color: strong positive correlations (for example, cyl and disp).
• Purple color: strong negative correlations (for example, mpg and wt).
Correlations near zero fall in the blue-green middle of the scale, so both the strongest and the weakest relationships can be identified quickly.
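The same ranking can be read off the correlation matrix numerically. The following is a minimal sketch in base R that lists each variable pair once and sorts the pairs by absolute correlation; `idx` and `pairs_ranked` are illustrative names, not part of the original analysis.

```r
# Correlation matrix for mtcars, rounded as in the section above.
small_corr <- round(cor(mtcars), 2)

# Keep each pair once: positions in the upper triangle, diagonal excluded.
idx <- which(upper.tri(small_corr), arr.ind = TRUE)

# Illustrative helper table: one row per variable pair.
pairs_ranked <- data.frame(
  var1 = rownames(small_corr)[idx[, "row"]],
  var2 = colnames(small_corr)[idx[, "col"]],
  r    = small_corr[idx]
)

# Sort by absolute correlation, strongest first.
pairs_ranked <- pairs_ranked[order(-abs(pairs_ranked$r)), ]

head(pairs_ranked, 5)  # strongest relationships
tail(pairs_ranked, 5)  # weakest relationships
```

Sorting on the absolute value keeps strong negative pairs next to strong positive ones, which matches how the Heatmap is read (both ends of the color scale matter).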
What the code does:
• Generates a Scatter Matrix for key variables mpg, hp, wt, and qsec in the mtcars dataset.
fig2 <- plot_ly(
  data = small_data,
  type = "splom",
  dimensions = list(
    list(label = "mpg", values = ~mpg),
    list(label = "hp", values = ~hp),
    list(label = "wt", values = ~wt),
    list(label = "qsec", values = ~qsec)
  )
) %>%
  layout(title = "Scatter Matrix (Small Dataset: mtcars)")
fig2
About the plot:
The Scatter Matrix visualizes pairwise relationships between key variables in the mtcars dataset. Note that the diagonal panels plot each variable against itself rather than showing its distribution. For example, mpg shows a strong negative correlation with hp (about -0.78) and wt (about -0.87).
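Because the diagonal and the upper triangle of a scatter matrix add no new information, they can be hidden to leave more room for the remaining panels. The following is a minimal sketch, assuming the same four mtcars variables; `fig2_lower` is an illustrative name, and `diagonal` and `showupperhalf` are splom trace attributes passed straight through `plot_ly()`.

```r
library(plotly)

# Same four mtcars variables as above; values are passed directly as vectors.
fig2_lower <- plot_ly(
  type = "splom",
  dimensions = list(
    list(label = "mpg",  values = mtcars$mpg),
    list(label = "hp",   values = mtcars$hp),
    list(label = "wt",   values = mtcars$wt),
    list(label = "qsec", values = mtcars$qsec)
  ),
  # Hide the panels that plot a variable against itself.
  diagonal = list(visible = FALSE),
  # The upper triangle mirrors the lower one, so drop it.
  showupperhalf = FALSE
) %>%
  layout(title = "Scatter Matrix, lower half only (mtcars)")
```

Printing `fig2_lower` displays the plot, in the same way as `fig2`.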
What the code does:
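• Samples 1,000 rows from the diamonds dataset, using a fixed seed so the sample is reproducible.
• Computes the correlation matrix for numeric variables.
data("diamonds")
# Arbitrary seed so the same 1,000 rows are drawn on every run
set.seed(42)
large_data <- diamonds %>% slice_sample(n = 1000)
large_corr <- large_data %>%
  select(where(is.numeric)) %>%
  cor() %>%
  round(2)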
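Because these correlations come from a random sample, they differ slightly from the values for the full dataset. The following is a rough check, assuming the ggplot2 package is installed; the seed is arbitrary and `full_corr`, `samp_corr`, and `max_diff` are illustrative names.

```r
library(ggplot2)  # provides the diamonds dataset

set.seed(123)  # arbitrary seed, only so this sketch is repeatable

# Logical mask of the numeric columns in diamonds.
num_cols <- sapply(diamonds, is.numeric)

# Correlations on all rows versus on a 1,000-row random sample.
full_corr   <- cor(diamonds[, num_cols])
sample_rows <- sample(nrow(diamonds), 1000)
samp_corr   <- cor(diamonds[sample_rows, num_cols])

# Largest absolute gap between a sampled and a full-data correlation.
max_diff <- max(abs(samp_corr - full_corr))
round(max_diff, 3)
```

A small `max_diff` suggests that the sample is adequate for a visual overview; a large one would be a reason to draw a bigger sample.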
What the code does:
• Creates a Heatmap to visualize the correlations, with the color range fixed to [-1, 1] and each cell labeled with its correlation value.
fig3 <- plot_ly(
  x = colnames(large_corr),
  y = rownames(large_corr),
  z = large_corr,
  type = "heatmap",
  colorscale = "Viridis",
  # Fix the color range so the same color means the same correlation in every heatmap
  zmin = -1,
  zmax = 1,
  text = large_corr,
  hoverinfo = "x+y+text"
) %>%
  layout(
    title = "Heatmap of Correlation (Large Dataset: diamonds)",
    # One label per cell: x follows the columns, y follows the rows
    annotations = list(
      x = rep(colnames(large_corr), each = nrow(large_corr)),
      y = rep(rownames(large_corr), ncol(large_corr)),
      text = as.character(large_corr),
      showarrow = FALSE,
      font = list(size = 12, color = "white")
    )
  )
fig3
About the plot:
The Heatmap shows correlations between numeric variables in the diamonds sample. Strong positive correlations are visible between carat and the size-related variables (x, y, z), highlighted in yellow; price is also strongly correlated with carat.
What the code does:
• Generates a Scatter Matrix for all numeric variables in the diamonds dataset sample.
numeric_data <- large_data[sapply(large_data, is.numeric)]
fig4 <- plot_ly(
  data = numeric_data,
  type = "splom",
  dimensions = lapply(names(numeric_data), function(col) {
    list(label = col, values = numeric_data[[col]])
  })
) %>%
  layout(
    title = "Scatter Matrix (Large Dataset: diamonds)",
    margin = list(b = 50)
  )
fig4
About the plot:
The Scatter Matrix for the diamonds dataset sample shows pairwise relationships between numeric variables. For instance, carat has a clear positive relationship with x, y, and z; it is slightly curved rather than strictly linear, because weight scales with volume.
Key Findings:
1. Heatmap:
• Effective for quickly assessing correlations in both small and large datasets.
• Color gradients make it easy to identify the strongest and weakest relationships.
2. Scatter Matrix:
• More informative for detailed pairwise analysis of variables.
• Suitable for small datasets or selected subsets of variables in large datasets, since the number of panels grows with the square of the number of variables.
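One way to choose a subset of variables for a Scatter Matrix is to rank them by absolute correlation with a variable of interest. The following is a minimal sketch in base R, using mtcars so that it is self-contained; `target`, `k`, `top_vars`, and `splom_vars` are illustrative names and choices, not part of the analysis above.

```r
# Illustrative choices: the variable of interest and how many others to keep.
target <- "mpg"
k <- 3

corr <- cor(mtcars)

# Correlations of every other variable with the target.
target_corr <- corr[target, setdiff(colnames(corr), target)]

# Names of the k variables with the largest absolute correlation.
top_vars <- names(sort(abs(target_corr), decreasing = TRUE))[seq_len(k)]

# Variables to pass to the scatter matrix.
splom_vars <- c(target, top_vars)
splom_vars

# Panel counts: all variables versus the selected subset.
c(all_variables = ncol(mtcars)^2, subset = length(splom_vars)^2)
```

The columns in `splom_vars` can then be used to build the `dimensions` list of a `splom` trace, as in the Scatter Matrix code above.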